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ABSTRACT 

Strong gravitational lens systems with extended sources are of special interest 
because they provide additional constraints on the models of the lens systems. To use 
a gravitational lens system for measuring the Hubble constant, one would need to 
determine the lens potential and the source intensity distribution simultaneously. A 
linear inversion method to reconstruct a pixellated source brightness distribution of a 
given lens potential model was introduced by Warren & Dye. In the inversion process, 
a regularisation on the source intensity is often needed to ensure a successful inversion 
with a faithful resulting source. In this paper, we use Bayesian analysis to determine 
the optimal regularisation constant (strength of regularisation) of a given form of 
regularisation and to objectively choose the optimal form of regularisation given a 
selection of regularisations. We consider and compare quantitatively three different 
forms of regularisation previously described in the literature for source inversions in 
gravitational lensing: zeroth-order, gradient and curvature. We use simulated data 
with the exact lens potential to demonstrate the method. We find that the preferred 
form of regularisation depends on the nature of the source distribution. 
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1 INTRODUCTION 

The use of strong gravitational lens systems to measure cos- 
mological parameters and to probe matter (including dark 
matter) is well known (e.g. Refsdal 1964; Kochanek, Schnei- 
der & Wambsganss 2006). Lens systems with extended 
source brightness distributions are particularly useful since 
they provide additional constraints for the lens modelling 
due to surface brightness conservation. In such a system, 
one would need to fit simultaneously the source intensity 
distribution and the lens potential model (or, equivalently 
the lens mass distribution) to the observational data. The 
use of a pixellated source brightness distribution has the 
advantage over a parametric source brightness distribution 
in that the source model is not restricted to a particular 
parameter space. Warren & Dye (2003) introduced a lin- 
ear inversion method to obtain the best-fitting pixellated 
source distribution given a lens model and the observational 
data. Several groups of people (e.g. Wallington, Kochanek & 
Narayan 1996; Treu & Koopmans 2004; Dye & Warren 2005; 
Koopmans 2005; Brewer & Lewis 2006) have used pixellated 
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source distributions, and some (Koopmans 2005; Suyu & 
Blandford 2006) even used a pixellated potential model for 
the lens. 

The method of source inversion described in Warren 
& Dye (2003) requires the source distribution to be "regu- 
larised" (i.e., smoothness conditions on the inverted source 
intensities to be imposed) for reasonable source resolutions. 1 
For fixed pixel sizes, there are various forms of regularisa- 
tion to use and the differences among them have not been 
addressed in detail. In addition, associated with a given form 
of regularisation is a regularisation constant (signifying the 



1 The source pixel sizes are fixed and are roughly a factor of the 
average magnification smaller than the image pixel sizes. In this 
case, regularisation is needed because the number of source pixels 
is comparable to the number of data pixels. On the other hand, 
if the number of source pixels is much fewer than the effective 
number of data pixels (taking into account of the signal-to-noise 
ratio), the data alone could be sufficient to constrain the pixel- 
lated source intensity values and regularisation would play little 
role. This is equivalent to imposing a uniform prior on the source 
intensity distribution (a prior on the source is a form of regulari- 
sation), a point to which we will return later in this article. 
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strength of the regularisation) , and the way to set this con- 
stant has been unclear. These two long-standing problems 
were noted in Kochanek et al. (2006). Our goal in this paper 
is to use Bayesian analysis to address the above two issues 
by quantitatively comparing different values of the regular- 
isation constant and the forms of regularisation. 

Brewer & Lewis (2006) also followed a Bayesian ap- 
proach for pixellated source inversions. The main difference 
between Brewer & Lewis (2006) and this paper is the prior 
on the source intensity distribution. Furthermore, this paper 
quantitatively compares the various forms of regularisation 
by evaluating the so-called "evidence" for each of the forms 
of regularisation in the Bayesian framework; Brewer & Lewis 
(2006) mentioned the concept of model comparison but did 
not apply it. 

Dye & Warren (2005) use adaptive source grids to avoid 
the use of explicit regularisation (i.e., uniform priors are im- 
posed since adapting the grids is an implicit form of regu- 
larisation); however, the Bayesian formalism would still be 
useful to set the optimal scales of the adaptive pixel sizes 
objectively. Furthermore, regularised source inversions (as 
opposed to unregularised - see footnote 1) permit the use of 
smaller pixel sizes to obtain fine structures. 

The outline of the paper is as follows. In Section 2, 
we introduce the theory of Bayesian inference, describing 
how to fit a model to a given set of data and how to rank 
the various models. In Section 3, we apply the Bayesian 
analysis to source inversions in strong gravitational lensing 
and show a way to rank the different forms of regularisations 
quantitatively. 



2 BAYESIAN INFERENCE 

We follow MacKay (1992) for the theory of Bayesian analy- 
sis, but use different notations that are more convenient for 
the application to gravitational lensing in Section 3. 

In Bayesian analysis, there are two levels of inference 
for data modelling. In the first level of inference, we choose 
a model and fit it to the data. This means characterising 
the probability distribution for the parameters of the model 
given the data. In the second level of inference, we want 
to rank the models quantitatively in the light of the data. 
By asking for the relative probabilities of models given the 
data, Bayesian analysis incorporates Occam's razor (which 
states that overly complex models should not be preferred 
over simpler models unless the data support them) in this 
second level of inference. The appearance of Occam's razor 
will be evident at the end of Section 2.2.1. In the following 
subsections, we will describe the two levels of inference in 
detail. 



2.1 Model fitting 

Let d be a vector of data points dj, where j = 1, . . . ,N d 
and iVd is the total number of data points. Let Si be the 
model parameters that we want to infer given the data, 
where i = 1, . . . ,N S and iV s is the number of parameters. 
Let f represent the response function that relates the model 
parameters to the measured data. (In the application of 
source reconstruction in gravitational lensing in Section 3, 
f encodes information on the lens potential, which is fixed 



in each iteration of source reconstruction.) For simplicity, 
consider f to be a constant linear transformation matrix of 
dimensions iY d -by-iV s such that 



d = fs + n 



(1) 



where n is the noise in the data characterised by the co- 
variance matrix Cd (here and below, subscript D indicates 
"data"). 

Modelling the noise as Gaussian, the probability of the 
data given the model parameters s is 



P(d\s,f) = 



where 



CX P (-gp(d|8,f)) 

Z D 



(2) 



E D (d\s,f) = i(fs-d) T C D 1 (fs-d) 

and Zy> = (27r) JVd/ ' 2 (det Cd) 1//2 is the normalisation for the 
probability. The probability P(d\s, f) is called the likelihood, 
and E-o(d\s,f) is half the standard value of \ 2 - in many 
cases, the problem of finding the most likely solution sml 
that minimizes Ed is ill-posed. This indicates the need to 
set a prior P(s\g,X) on the parameters s. The prior can 
be thought of as "regularising" the parameters s to make 
the prediction f s smooth. We can express the prior in the 
following form 



P(s\s, A) 



cxp(-Ag s (s|g)) 
ZsW 



(4) 



where A, the so-called regularisation constant, is the strength 
of regularisation and Zs(X) = j d Ns s exp(-XEs) is the nor- 
malisation of the prior probability distribution. The func- 
tion Es is often called the regularising function. We focus 
on commonly used quadratic forms of the regularising func- 
tion, and defer the discussion of other priors to Section 2.2.2. 
As we will see in Section 2.2.1, Bayesian analysis allows us 
to infer quantitatively the value of A from the data in the 
second level of inference. 

Bayes' rule tells us that the posterior probability of the 
parameters s given the data, response function and prior is 



P(s\d,\,f,g) = 



P(d\s,{)P(s\g,\) 

P(d\\,f,g) ' 



(5) 



where P(d|A,f, g) is the normalisation that is called the ev- 
idence for the model {A, f , g}. Since both the likelihood and 
prior are either approximated or set as Gaussians, the poste- 
rior probability distribution is also a Gaussian. The evidence 
is irrelevant in the first level of inference where we maximize 
the posterior (equation (5)) of parameters s to obtain the 
most probable parameters smp- However, the evidence is 
important in the second level of inference for model com- 
parisons. Examples of using the evidence in astronomical 



2 The Gaussian assumption is usually applicable to optical CCD 
data which have noise at each pixel characterised by dispersion 
Oj , the square root of the corresponding diagonal entry of the co- 
variance matrix. In general, there is correlation between adjacent 
pixels due to charge transfer (bleeding) and the drizzling process, 
which is characterised by the off-diagonal terms in the covariancc 
matrix. 
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context are Hobson, Bridle & Lahav (2002) and Marshall 
et al. (2002). 

To simplify the notation, let us define 

M{s) = E u {s) + XE s (s). (6) 

With the above definition, we can write the posterior as 

exp(-M(s)) 



P(s|d,A,f,g) = 



(7) 



Z M (X) ' 

where Zm(X) = j d N " s exp(— M (s)) is the normalisation. 

The most likely versus the most probable solution 

By definition, the most likely solution sml maximizes the 
likelihood, whereas the most probable solution smp maxi- 
mizes the posterior. In other words, 

Sml minimizes Ppj in 

equation (3) (V£d(*ml) 

minimizes M in equation (6) (VM(smp) = 0). 

Using the definition of the most likely solution, it is not 
difficult to verify that it is 



= 0, where V = and sn 



where 

F = f T C D _1 f 

and 

D = f T C^d 



(8) 



(9) 



(10) 



The matrix F is square with dimensions iV s x N s and the 
vector D has dimensions N B . 

The most probable solution smp can in fact be obtained 
from the most likely solution s ML - If the regularising func- 
tion Es is a quadratic functional that obtains its minimum 
at Srcg (i.e., V-Es(sreg) = 0), then we can Taylor expand En 
and Es to 

E u (s) = Ed(sml) + i(s - s M l) T B(s - sml) 
and 

Es(s) = Ps(s rcg ) 



(11) 



, 1, ,T 
+ „ (S s rcg 



(12) 



where B and C are Hessians of Ed and Es, respectively: 
B = VVP D (s) and C = VVP S (s). Equations (11) and (12) 
are exact for quadratic forms of En and Es with the Hessians 
B and C as constant matrices. For the form of Eo in equation 
(3), B is equal to F that is given by equation (9). We define 
A as the Hessian of M, i.e. A = VVM(s), and by equa- 
tion (6), A = B + AC. Using equations (6), (11), and (12) in 
VM(smp) = 0, we can get the most probable solution (that 
maximizes the posterior) as smp = A _1 (Bsml + ACs reg ). 
The simplest forms of the prior, especially the ones we will 
use for the gravitational lensing inversion in Section 3, have 
s r0 g = 0. In the case where s correspond to pixel intensity 
values, s rcg = implies a prior preference towards a blank 
image. The noise suppression effect of the regularisation fol- 
lows from this supplied bias. Focusing on such forms of prior, 
the most probable solution becomes 



smp 



= A _1 Bsn 



(13) 



This result agrees with equation (12) in Warren & Dye 
(2003). In fact, equation (13) is always valid when the reg- 
ularising function can be written as Es(s) = |s T Cs. 



Equation (13) indicates a one-time calculation of sml 
via equation (8) that permits the computation of the most 
probable solution smp by finding the optimal regularisation 
constant of a given form of regularisation. The parameters 
smp in equation (13) depend on the regularisation constant 
A since the Hessian A depends on A. Bayesian analysis pro- 
vides a method for setting the value of A, as described in the 
next subsection. 



2.2 Model comparison 

In the previous section, we found that for a given set of data 
d and a model (response function f and regularisation g 
with regularisation constant A), we could calculate the most 
probable solution smp for the particular A. In this section, 
we consider two main points: (i) how to set the regularisation 
constant A for a given form of regularisation g and (ii) how 
to rank the different models f and g. 



2.2.1 Finding A 

To find the optimal regularisation constant A, we want to 
maximize 

P(d|A,f,g)P(A) 



P(A|d,f,g) 



P(d\f,e) 



(14) 



using Bayes' rule. Assuming a flat prior in log A, the ev- 
idence P(d|A,f, g) which appeared in equation (5) is the 
quantity to consider for optimising A. 

Combining and rearranging equations (2), (4), (5), (6), 
and (7), we get 



P(d|A,f,g) = 



Z M (\) 
ZnZs(X)' 



(15) 



For quadratic functional forms of Es(s) with s rcg = 0, we 
have 

Z s (A) = e"^W ^^ 8/2 (detC) -i/ 2; (16) 

(17) 



Z M (A) = e - M(SMp) (2 7 r) JV = /2 (det A)- 1/2 , 
and recall 

Z D = (2^/ 2 (dctC D ) 1/2 . 



(18) 



Remembering that optimising a function is equivalent to 
optimising the logarithm of that function, we will work with 
log P(d\ A, f , g) to simplify some of the terms. Recalling that 
s rog = 0, by combining and simplifying equations (15) to 
(18), we have 

logP(d|A,f,g) = -APs(smp) -Pd(smp) 

- i log (dot A) + ^ log A + AP S (0) 
1 

+ -log(dctC)-^log(2^) 



+ ilog(det O. 



(19) 



In deriving equation (19) using equation (16), we implic- 
itly assumed that C, the Hessian of Es, is non-singular. The 

3 We use a flat prior that is uniform in log A instead of A because 
we do not know the order of magnitude of A a priori. 
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forms of regularisation we will use for gravitational lensing 
inversion in Section 3 have non-singular Hessians so that 
equation (19) is applicable. For the cases in which the Hes- 
sian is singular (i.e., at least one of the eigenvalues of the 
Hessian is zero), the prior probability distribution is uniform 
along the eigen-directions of the Hessian with zero eigenval- 
ues. The prior probability distribution will need to be renor- 
malised in the construction of the log evidence expression. 
The resulting log evidence expression can still be used to 
determine the optimal A in these cases because only the rel- 
ative probability is important and this normalising factor of 
the uniform prior, though infinite, will cancel in the ratios 
of probabilities. 

Solving 31 ^ T logP(d|A,f, g) = 0, we get the following 

equation for the optimal regularisation constant A: 

2A£s(s M p) = iV s -ATr(A- 1 C), (20) 

where Tr denotes the trace. Since smp and A depend on A, 
the above equation (20) is often nonlinear and needs to be 
solved numerically for A. 

For the reader's convenience, we reproduce the expla- 
nation in MacKay (1992) of equation (20). The equation 
is analogous to the (perhaps) familiar statement that \ 
roughly equals the number of degrees of freedom. Focusing 
on the usual case where Es(s rcs — 0) — and transform- 
ing to the basis in which the Hessian of Es is the identity 
(i.e., C = I), the left-hand side of equation (20) becomes 
2AJ5s(smp) = As^pSMP- This quantity can be thought of 
as the "xl of the parameters" if we associate A with the 
width (as) of the Gaussian prior: A = l/cr§. The left-hand 
side of equation (20) can be viewed as a measure of the 
amount of structure introduced by the data in the parame- 
ter distribution (relative to the null distribution of s rog = 0). 
Continuing the analogy, the right-hand side of equation (20) 
is a measure of the number of "good" parameters (where 
"good" here means well-determined by the data, as we ex- 
plain below). In the same basis where C = I, we can write 
the eigenvalues of A(= B + AC) as fj, a + A, where fj, a are the 
eigenvalues of B and index a = 1, . . . , N s . In this basis, the 
right-hand side, which we denote by 7, becomes 

^ Ha + A ^ H a + A 

a— 1 a— I 

For each eigenvalue of B, the fraction Ma . is a value be- 
tween and 1, so 7 is a value between and N s . If fj, a is much 
smaller than A, then the data are not sensitive to changes 
in the parameters along the direction of the eigenvector of 
fj, a - This direction contributes little to the value of 7 with 
M " "C 1, and thus it does not constitute as a good parame- 
ter. Similar arguments show that eigendirections with eigen- 
values much greater than A form good parameters. Therefore 
7, which is a sum of all the factors ^° x , is a measure of 
the effective number of parameters determined by the data. 
Thus, the solution to equation (20) is the optimal A that 
matches the xl of the parameters to the number of effective 
parameters. 

For a given form of regularisation Es(s), we are let- 
ting the data decide on the optimal A by solving equation 
(20). Occam's razor is implicit in this evidence optimisa- 
tion. For overly-small values of A, the model parameter space 
is overly-large and Occam's razor penalises such an overly- 



powerful model; for overly-large values of A, the model pa- 
rameter space is restricted to a limited region that the model 
can no longer fit to the data. Somewhere in between the two 
extremes is the optimal A that gives a model which fits to 
the data without being overly-complex. 

There is a shortcut to obtaining an approximate value of 
the optimal A instead of solving equation (20) (Bridle et al. 
1998). Given that 7 is a measure of the effective number 
of parameters, the classical number of degrees of freedom 
(NDF) should be Nd — 7- At the optimal A, we thus ex- 
pect Ed(smp) = \x 2 ~ \ {Nd — 7). Inserting this and the 
expression of \Es(s M p) from equation (20) into equation 
(6), we find that M(smp) ~ ^Nd- In other words, one can 
choose the value of A such that M evaluated at the resulting 
most probable parameters (smp) is equal to half the num- 
ber of data points. We emphasise that this will give only an 
approximate result for the optimal A due to the fuzzy asso- 
ciation of NDF with iVd — 7, but it may serve as a useful 
hack. 

2.2.2 Ranking models 

We can compare the different rcgularisations g and re- 
sponses f by examining the posterior probability of g and 
f: 

P(f,g|d)ocP(d|f,g)P(f,g). (22) 

If the prior P(f, g) is flat, then P(d\f, g) can be used to 
rank the different models and regularisations. We can write 
P(d|f,g) as 

P(d|f,g)= /p(d|f,g,A)P(A)dA, (23) 

where P(d\f, g, A) is precisely the evidence in equation (19). 

As seen in equation (23) above, the regularisation con- 
stant A is a nuisance parameter which invariably ends up 
being marginalised over. We might well expect the corre- 
sponding distribution for A to be sharply peaked, since we 
expect the value of A to be estimable from the data (as shown 
in Section 2.2.1); a particular value of A is preferred as a con- 
sequence of the balance between goodness of fit and Occam's 
razor. Consequently, we can approximate P(Ajd,f, g) by a 
delta function centred on the most probable constant, A. 
The model-ranking evidence P(d|f,g) in equation (23) can 
then be approximated by P(d|f,g, A) in equation (19). 

The approximation of using equation (19) to rank reg- 
ularsations is only valid if the Hessians of the different reg- 
ularising functions are non-singular. When the Hessian is 
singular, equation (19) will need to be modified to include 
a (infinite) normalisation constant that is regularisation de- 
pendent. The constants for different regularisation schemes 
generally will not cancel when one considers evidence ratios, 
thus prohibiting one from comparing different regularisation 
schemes. 

One can imagine there being much debate on the form 
of the prior P(f , g) that should be used. For example, some 
success has been achieved using maximum entropy meth- 
ods (e.g. Gull & Daniell 1978; Skilling 1989), whose prior 
form enforces positivity in the image and is maximally non- 
committal with regard to missing data. One practical prob- 
lem with using the entropic prior is its non-linearity. In this 
work we take a modern Bayesian view and argue that while 
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we will always have some a priori prejudice about the recon- 
structed image (for example, favouring zero flux, or insisting 
on positive images) , we would do well to try and learn from 
the data itself, assigning series of sensible priors and using 
the evidence to compare them quantitatively. In this con- 
text, we examine a small number of sensibly chosen priors 
(regularisation schemes) , and compute the evidence for each. 
We do not exhaustively seek the prior that maximizes the 
evidence, noting that this will change from object to ob- 
ject, and observation to observation. What we do provide 
is the mechanism by which prior forms can be compared, 
and demonstrate that good quality reconstructions can be 
obtained by optimising over our set of candidate priors. In 
Section 3.1, we discuss the various forms of prior that have 
been used in strong gravitational lensing. 



3 APPLICATION TO GRAVITATIONAL 
LENSING 

We apply the Bayesian formalism developed in the previ- 
ous section to source inversions in strong gravitational lens- 
ing. The process of finding the best-fitting pixellated source 
brightness distribution given a lens potential model and an 
observed image has been studied by, for examples, Walling- 
ton et al. (1996), Warren & Dye (2003), Treu & Koopmans 
(2004), Koopmans (2005), Dye & Warren (2005) and Brewer 
& Lewis (2006). The authors regularised the source inver- 
sion in order to obtain a smooth (physical) source intensity 
distribution. The forms of regularisation used in this paper 
are addressed in detail in Appendix A. In Section 3.1, we 
describe the Bayesian analysis of source inversions in grav- 
itational lensing. Sections 3.2 and 3.3 are two examples il- 
lustrating regularised source inversions. In both examples, 
we use simulated data to demonstrate for the first time the 
Bayesian technique of quantitatively comparing the differ- 
ent types of regularisation. Finally, Section 3.4 contains ad- 
ditional discussions based on the two examples. 

3.1 Regularised source inversion 

To describe the regularised source inversion problem, we fol- 
low Warren & Dye (2003) but in the Bayesian language. Let 
dj, where j = l,...,Nd, be the observed image intensity 
value at each pixel j and let Cd be the covariance matrix as- 
sociated with the image data. Let Si, where i = 1, . . . , N s , be 
the source intensity value at each pixel i that we would like 
to reconstruct. For a given lens potential and point spread 
function (PSF) model, we can construct the Na-by-N s ma- 
trix f that maps a source plane of unit intensity pixels to 
the image plane by using the lens equation (a practical and 
fast method to compute f is described in the appendices 
of Treu & Koopmans (2004), and an alternative method is 
discussed in Wallington et al. (1996)). We identify Eu with 
\x (equation (3)) and Es with the quadratic regularising 
function whose form is discussed in detail in Appendix A. 
The definitions and notations in our regularised source in- 
version problem are thus identical to the Bayesian analysis 
in Section 2 with data d and mapping matrix (response 
function) f. Therefore, all equations in Section 2 are im- 
mediately applicable to this source inversion problem, for 
example the most probable (regularised) source intensity is 



given by equation (13). We take as estimates of the 1-cr 
uncertainty on each pixel value the square root of the corre- 
sponding diagonal element of the source covariance matrix 
given by 

C s = A" 1 (24) 

(here and below, subscript S indicates "source"), where A 
is the Hessian defined in Section 2.1. Equation (24) differs 
from the source covariance matrix used by Warren & Dye 
(2003). We refer the reader to Appendix B for details on the 
difference. 

In summary, to find the most probable source given 
an image (data) d, a lens and PSF model f and a form 
of regularisation g, the three steps are: (i) find the most 
likely source intensity, sml (the unregularised source inver- 
sion with A = 0); (ii) solve equation (20) for the optimal A of 
the particular form of regularisation, where smp is given by 
equation (13); (iii) use equations (13) and (24) to compute 
the most probable source intensity and its 1-cr error with the 
optimal A from step (ii). 

Having found a recipe to compute the optimal A and the 
most probable inverted source intensity smp for a given form 
of regularisation g and a lens and PSF model f , we can rank 
the different forms of regularisation. For a given potential 
and PSF model f, we can compare the different forms of 
regularisation by assuming the prior on regularisation g to 
be flat and using equations (22), (23), and (19) to evaluate 
P(f,s\d). 

In this paper, we consider three quadratic functional 
forms of regularisation: zeroth order, gradient, and curva- 
ture (see Appendix A for details). These were used in War- 
ren & Dye (2003) and Koopmans (2005). The zeroth order 
regularisation tries to suppress the noise in the reconstructed 
source brightness distribution as a way to impose smooth- 
ness by minimizing the source intensity at each pixel. The 
gradient regularisation tries to minimize the gradient of the 
source distribution, which is equivalent to minimizing the 
difference in the source intensities between adjacent pixels. 
Finally, the curvature regularisation minimizes the curva- 
ture in the source brightness distribution. The two examples 
in the following subsections apply the three forms of regu- 
larisation to the inversion of simulated data to demonstrate 
the Bayesian regularised source inversion technique. 

Our choice of using quadratic functional forms of the 
prior is encouraged by the resulting linearity in the inver- 
sion. The linearity permits fast computation of the maximi- 
sation of the posterior without the risk of being trapped in 
a local maximum during the optimisation process. However, 
the quadratic functional forms may not be the most physi- 
cally motivated. For example, positive and negative values of 
the source intensity pixels are equally preferred, even though 
we know that intensities must be positive. Wallington et al. 
(1996) and Wayth et al. (2005) used maximum entropy 
methods that enforced positivity on the source brightness 
distribution. Such forms of the prior would help confine the 
parameter space of the source distribution and result in a 
perhaps more acceptable reconstruction. The disadvantage 
of using the entropic prior is its resulting non-linear inver- 
sion, though we emphasise that Bayesian analysis can still be 
applied to these situations to rank models. Another example 
is Brewer & Lewis (2006) who used priors suited for astro- 
nomical images that are mostly blank. This form of prior 



6 S. H. Suyu et al. 



also led to a non-linear system. In the following sections, 
we merely focus on quadratic forms of the prior because 
(i) it is computational efficiency, and (ii) we could obtain 
good quality reconstruction without considering more com- 
plex regularisation schemes. 

3.2 Demonstration 1: Gaussian Sources 

3.2.1 Simulated data 

As the first example to demonstrate the Bayesian approach 
to source inversion, we use the same lens potential and 
source brightness distribution as that in Warren & Dye 
(2003). The lens is a singular isothermal ellipsoid (SIE) at 
a redshift of z& — 0.3 with one-dimensional velocity disper- 
sion of 260 km s -1 , axis ratio of 0.75, and semi-major axis 
position angle of 40 degrees (from vertical in counterclock- 
wise direction). We use Kormann, Schneider & Bartelmann 
(1994) for the SIE model. We assume a fiat A-CDM universe 
with cosmological parameters of Q m = 0.3 and J1a = 0.7. 
The image pixels are square and have sizes 0.05" in each 
direction. We use 100 x 100 image pixels (N d = 10000) in 
the simulated data. 

We model the source as having two identical Gaussians 
with variance 0.05" and peak intensity of 1.0 in arbitrary 
units. The source redshift is z B = 3.0. We set the source pix- 
els to be half the size of the image pixels (0.025") and have 
30 x 30 source pixels (N s = 900). Fig.l shows the source 
in the left-hand panel with the caustic curve of the SIE po- 
tential. One of the Gaussians is located within the astroid 
caustic and the other is centred outside the caustic. 

To obtain the simulated data, we use the SIE lens model 
and the lens equation to map the source intensity to the 
image plane. We then convolve the resulting image with a 
Gaussian PSF whose FWHM is 0.08" and add Gaussian 
noise of variance 0.067 to the convolved image. For simplic- 
ity, the noise is uncorrelated, which is a good approximation 
to realistic noise with minimal charge transfer and drizzling. 
The right-hand panel of Fig. 1 shows the simulated data with 
the critical curve of the SIE model. 

3.2.2 Most likely inverted source 

We use the original SIE potential, PSF and Gaussian noise 
models of the simulated data for the source inversion to 
demonstrate the technique. 

The appendices of Treu & Koopmans (2004) describe a 
computationally efficient method to construct the f matrix. 
Following the method, we discretize the SIE potential to the 
100 x 100 grid and model the PSF on a 5 x 5 grid (which is 
a sufficient size since the 5x5 grid centred on the Gaussian 
PSF of FWHM 0.08" contains 99.99 per cent of the total 
intensity). Subsequently, for every image pixel j, we use the 
lens equation to trace to the source plane labelled by pixels 
i and interpolate to get the elements of unblurred f . Lastly, 
we multiply the unblurred f by the blurring (convolution) 
operator constructed from the 5x5 PSF model to get the 
full f matrix. With j = 1, . . . , Nd and i — 1, . . . , N s , the 
matrix f is large (10000 x 900) but fortunately sparse. 

In the right-hand panel of Fig. 1, the dotted lines on 
the simulated data mark an annular region where the image 
pixels map to the finite source plane. In other words, the 



image pixels within the dotted annulus correspond to the 
non-empty rows of the f matrix. The annular region thus 
marks the set of data that will be used for the source inver- 
sion process. 

With the f matrix and the data of simulated image in- 
tensities in the annulus, we can construct matrix F and vec- 
tor D using equations (9) and (10) 4 for the unregularised 
inversion (the most likely source intensity, in Bayesian lan- 
guage). We use UMFPACK 5 for sparse matrix inversions 
and determinant calculations. We compute the inverse of 
the matrix F and apply equation (8) to get the most likely 
source intensity. Using UMFPACK, the computation time 
for the inversion of F, a 900 x 900 matrix in this example, 
is only ~ 20 seconds on a 3.6 GHz CPU. Setting A = 
(implicit in A) in equation (24), we obtain the covariance 
matrix of the inverted source intensity and hence the 1-cr 
error and the signal-to-noise ratio. 

The top row of Fig. 2 shows the unregularised inverted 
source intensity in the left-hand panel, the 1-cr error of the 
intensity in the middle panel, and the signal-to-noise ratio 
in the right-hand panel. The unregularised inverted source 
intensity is smoother inside than outside the caustic curve 
because the source pixels within the caustic have additional 
constraints due to higher image multiplicities. The higher 
image multiplicities also explain the lower magnitude of the 
1-cr error inside the caustic curve. Despite the noisy re- 
construction especially outside the caustic curve, the two 
Gaussian sources have significant signal-to-noise in the right- 
hand panel. These results agree with Fig. 2 in Warren & Dye 
(2003). 

The bottom row of Fig. 2 shows the simulated data in 
the left-hand panel (from Fig. 1 for comparison purposes), 
the reconstructed data (from the most likely inverted source 
in the top left-hand panel and the f matrix) in the middle 
panel, and the residual (the difference between the simulated 
and reconstructed data) in the right-hand panel. The annu- 
lar region containing the data used for inversion is marked 
by dotted lines in the reconstructed and residual images. 
Visual inspection of the residual image shows that pixels in- 
side the annulus are slightly less noisy than those outside. 
This is due to over-fitting with the unregularised inversion. 
As we will see in the next subsection, Occam's razor that 
is incorporated in the Bayesian analysis will penalise such 
overly-powerful models. 

3.2.3 Most probable inverted source 

Having obtained the most likely inverted source, we can cal- 
culate the most probable source of a given form of regulari- 
sation with a given value of the regularisation constant A us- 
ing equation (13). In the remainder of this section, we focus 
on the three forms of regularisation (zeroth-order, gradient, 
and curvature) discussed in Appendix A. For each form of 
regularisation, we numerically solve equation (20) for the op- 
timal value of regularisation constant A using equation (13) 

4 The summations associated with the matrix multiplications in 
equations (9) and (10) are now summed over the pixels in the 
annulus instead of all the pixels on the image plane. 

5 a sparse matrix package developed by Timothy A. Davis, Uni- 
versity of Florida 
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Figure 1. Left-hand panel: The simulated Gaussian sources with peak intensities of 1.0 and FWHM of 0.05", shown with the astroid 
caustic curve of the SIE potential. Right-hand panel: The simulated image of the Gaussian sources (after convolution with Gaussian PSF 
and addition of noise, as described in the text). The solid line is the critical curve of the SIE potential, and the dotted lines mark the 
annular region where the source grid maps using the mapping matrix f. 



Table 1. The optimal regularisation constant for each of the three 
forms of regularisation for the inversion of two Gaussian sources. 
The log evidence, 7 (the right hand side of equation (20)), and 
the x 2 evaluated at the optimal regularisation constant are also 
listed. The number of data pixels in the annulus for inversion, 
iV annu i us , and three possible forms of constructing the reduced 
X 2 are shown. 



Regularisation 


zcroth-order 


gradient 


curvature 


A 


17.7 


34.2 


68.5 


logP(d|A,f,g) 


5086 


5367 


5410 


7 = N s ~ XTr(l\- 1 C) 


536 


287 


177 


x 2 = 2£ D 


3583 


3856 


4019 


N 1 

1 * annulus 


4325 


4325 


4325 


X /-^annulus 


0.83 


0.89 


0.93 


xV^annulus " N a ) 


1.05 


1.12 


1.17 


X 2 /(A r annulus-7) 


0.95 


0.95 


0.97 



for the values of smp ■ Table 1 shows the optimal regularisa- 
tion constant, A, for each of the three forms of regularisation. 
The table also includes the value of the evidence in equation 
(19) evaluated at A, which is needed for ranking the different 
forms of regularisation in the next subsection. 

Fig. 3 verifies the optimisation results for the gradi- 
ent form of regularisation. The evidence in dot-dashed lines 
(rescaled) is indeed a sharply-peaked function of A, justify- 
ing the delta-function approximation; the optimal regulari- 
sation constant A = 34.2 (listed in Table 1) is marked by the 
crossing point of the dashed and dotted lines, demonstrat- 
ing the balance between goodness of fit and simplicity of 
model that maximising the evidence achieves. The plots of 
equations (20) and (19) for zeroth-order and curvature reg- 
ularisations look similar to Fig. 3 and are thus not shown. 




X 



Figure 3. To demonstrate the A optimisation process, equations 

(19) and (20) are plotted as functions of A for the gradient reg- 
ularisation. The left-hand side and right-hand side of equation 

(20) are in dashed lines and dotted lines, respectively. The log 
evidence in equation (19) is shown in solid lines. The evidence, 
which as been rescaled to fit on the graph, is in dot-dashed lines. 
The left and right vertical axes are for equation (20) and (19), 
respectively. The crossing point of the left-hand side and right- 
hand side of equation (20) gives the optimal A, the position where 
the log evidence (hence evidence) obtains its maximum. 

In Table 1, we constructed three reduced \ 2 using the 

NDF as iVannuius, ^ annulus 

— N s , or iVannuius — 7) where 
-/Vannuius is the number of data pixels used in the inversion 
and recall N B is the number of source pixels reconstructed. 
In each of the three forms of regularisation, the reduced \ 2 
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Inverted Source 1— a Error of Inverted Source S/N Ratio of Inverted Source 




Simulated Data Reconstructed Data Residual 




0, / arcsec 1 / arcsec 6 1 / arcsec 

Figure 2. Unregularised inversion of Gaussian sources. Top left-hand panel: the most likely reconstructed source intensity distribution. 
The intensities outside the caustic curve of the potential model arc not well-reconstructed due to fewer constraints (lower image multi- 
plicities) outside the caustic curve. Top middle panel: the 1-a error of the inverted source intensity. The error is smaller inside the caustics 
due to additional multiple image constraints. Top right-hand panel: the signal-to-noise ratio of the inverted source intensity. The presence 
of the Gaussian sources is clear in this panel even though the reconstruction in the top left-hand panel is noisy. Bottom left-hand panel: 
the simulated data. Bottom middle panel: the reconstructed image using the most likely reconstructed source (top left-hand panel) and 
the f matrix from the potential and PSF models. Reconstructed data is confined to an annular region that maps on to the source plane. 
Bottom right-hand panel: the residual image obtained by subtracting the bottom middle panel from the bottom left-hand panel. The 
interior of the annular region is less noisy than the exterior, indicating that the unregularised reconstructed source is fitting to the noise 
in the simulated data. 



with NDF = iV ann ui us — 7 is closest to 1.0, which is the 
criterion commonly used to determine the goodness of fit. 
This supports our interpretation of the 7, the right-hand 
side of equation (20), as the number of "good" parameters 
determined by the data. The values of the reduced \ 2 is n °t 
strictly 1.0 because Bayesian analysis determines the opti- 
mal A by maximizing the evidence instead of setting the 
reduced \ 2 to 1.0. 

For each of the three forms of regularisation and its 
optimal regularisation constant listed in Table 1, we use 
equations (13) and (24) to obtain the most probable source 
intensity and its 1-a error. Fig. 4 shows the most probable 
source intensity (left-hand panels), the 1-a error (middle 
panels), and the signal-to- noise ratio (right-hand panels) for 
zeroth-order (top row), gradient (middle row) and curvature 
(bottom row) regularisations. The panels in each column are 
plotted on the same scales in order to compare the different 
forms of regularisation. The regularised inverted sources in 
the left-hand panels clearly show the two Gaussians for all 
three regularisations. Curvature regularisation results in a 
smoother source reconstruction than gradient regularisation 
which in turn gives smoother source intensities than zeroth- 



order regularisation. The 1-a errors in the middle column 
also indicates the increase in the smoothness of the source 
from zeroth-order to gradient to curvature regularisation due 
to a decrease in the error. This smoothness behaviour agrees 
with our claim in Appendix A that regularisations associ- 
ated with higher derivatives in general result in smoother 
source reconstructions. Since the error in the middle column 
decreases from the top to the bottom panel, the signal-to- 
noise of the source reconstruction increases in that order. 
Looking closely at the 1-a error in the middle column for 
gradient and curvature regularisations, the pixels in the left 
and bottom borders have larger error values. This can be ex- 
plained by the explicit forms of regularisation in equations 
(A2) and (A3). The pixels at the bottom and left borders are 
only constrained by their values relative to their neighbours, 
whereas the pixels at the top and right borders have addi- 
tional constraints on their values directly (last two terms in 
the equations). Visually, we observe that the source recon- 
struction with curvature regularisation matches the original 
source in Fig. 1 the best. In the next subsection, we will 
quantitatively justify that curvature regularisation is pre- 
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Figure 4. The regularised source inversions of Gaussian sources with zeroth-order, gradient and curvature regularisations. Top row, from 
left to right: most probable inverted source, the 1-rr error, and the signal-to-noise ratio with zeroth-order regularisation. Middle row, 
from left to right: same as top row but with gradient regularisation. Bottom row, from left to right: same as top row but with curvature 
regularisation. The panels in each column arc plotted on the same scales for comparison among the different forms of regularisation. 



ferred over gradient and zeroth-order regularisations in this 
example with two Gaussian sources. 

In Fig. 5, we show the reconstructed image and the 
image residual for the most probable inverted source with 
curvature regularisation. We omit the analogous figures for 
zeroth-order and gradient regularisations because they look 
very similar to Fig. 5. The left-hand panel is the simulated 
data in Fig. 1 that is shown for convenience for compar- 
ing to the reconstructed data. The middle panel is the re- 
constructed data obtained by multiplying the corresponding 
regularised inverted source in Fig. 4 by the f mapping matrix 



(only the pixels within the annulus [dotted lines] are recon- 
structed due to the finite source grid and PSF). The right- 
hand panel is the residual image, which is the difference be- 
tween the simulated and the reconstructed data. The slight 
difference among the reconstructed data of the three forms 
of regularisations is the amount of noise. Since the most 
probable inverted source gets less noisy from zeroth-order 
to gradient to curvature regularisation, the reconstructed 
data also gets less noisy in that order. The residual images 
of all three forms of regularisation look almost identical and 
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Figure 5. The image residual for curvature regularised source inversion with Gaussian sources. From left to right: simulated data, 
reconstructed data using the corresponding most probable inverted source in Fig. 4, and the residual equalling the difference between 
simulated and reconstructed data. The reconstructed data is restricted to the annulus marked by dotted lines that is mapped from the 
finite source grid using f . The noise in the residual image is more uniform compared to that of the unregularised inversion in Fig. 2. 



match the input (uniform Gaussian) noise, a sign of proper 
source reconstruction. 

In contrast to the residual image for the unregularised 
case in Fig. 2, the noise in the residual image in Fig. 5 is 
more uniform. This is Occam's razor in action - the presence 
of regularisation prevents the over-fitting to the noise within 
the annulus. For each form of regularisation, the value of A 
(Table 1) is optimal since it leads to the residual image in 
Fig. 5 having the input noise, which is uniform Gaussian 
noise in our example. If we over-regularise (i.e., use overly 
large A), then we expect the model to no longer fit to the 
data. This is shown in Fig. 6 which were obtained using cur- 
vature regularisation with A = 2000. The panels in the fig- 
ure are displayed in the same way as in Fig. 2. The inverted 
source (top left-hand panel) in Fig. 6 shows the smearing 
of the two Gaussian sources due to overly-minimized cur- 
vature among adjacent pixels. The resulting residual image 
(bottom right-hand panel) in Fig. 6 thus shows arc features 
that are not fitted by the model. However, note that the in- 
ferred signal-to-noise ratio in the source plane is very high; 
models that overly-regularise the source intensities give pre- 
cise (with small magnitudes for the error) but inaccurate 
results. Such overly-regularised models lead to low values 
of the evidence, which is the quantity to consider for the 
goodness of reconstruction. We seek an accurate reconstruc- 
tion of the source, and a signal-to-noise ratio that accurately 
reflects the noise in the data. The comparison among the 
unregularised, optimally regularised and overly-regularised 
inversions shows the power of the Bayesian approach to ob- 
jectively determine the optimal A (of a given form of reg- 
ularisation) that minimizes the residual without fitting to 
the noise. In the next subsection, we will see how Bayesian 
analysis can also be used to determine the preferred form of 
regularisation given the selection of regularisations. 



3.2.4 Optimal form of regularisation 

In the previous subsection, we showed how Bayesian analysis 
allowed us to determine objectively the optimal regularisa- 
tion constant for a given form of regularisation by maximiz- 
ing the evidence in equation (19). In this subsection we look 



for the optimal form of regularisation given the selection of 
regularisations. 

Since there is no obvious prior on the regularisation, 
we assume that the prior on the regularisation is flat. In 
this case, the different forms of regularisation is ranked 
by the value of P(d\f, g) in equation (23). Since the evi- 
dence P(d\f, g, A) is sharply peaked at A (as seen in Fig. 3), 
P(d\f, g) can be approximated by P(d\f, g, A). The values 
of the evidence P(d|f,g, A) in Table 1 indicate that the 
evidence for curvature regularisation is ~ e 43 and ~ e 324 
higher than that of gradient and zeroth-order regularisa- 
tions, respectively. Therefore, curvature regularisation with 
the highest evidence is preferred to zeroth-order and gra- 
dient for the two Gaussian sources. In quantitative terms, 
curvature regularisation is ~ e 43 more probable than gra- 
dient regularisation, which is ~ e 281 more probable than 
zeroth-order regularisation. This agrees with our comment 
based on Fig. 4 in Section 3.2.3 that visually, curvature reg- 
ularisation leads to an inverted source that best matches the 
original source of two Gaussians. 

The values of the reduced x 2 using NDF = -/V annu i us — 7 
in Table 1 show that curvature regularisation has the highest 
reduced \ 2 among the three forms of regularisation. The 
higher \ 2 value means a higher misfit due to fewer degrees of 
freedom (with more correlated adjacent pixels) in curvature 
regularisation. Nonetheless, the misfit is noise dominated 
since Fig. 5 shows uniform residual and the reduced \ 2 is 
~ 1.0. Therefore, the evidence optimisation is selecting the 
simplest model of the three regularisation schemes that fits 
to the data, enforcing Occam's razor. 

For general source brightness distributions, one may ex- 
pect that curvature regularisation with its complex struc- 
ture will always be preferred to the simplistic gradient and 
zeroth-order forms of regularisation. We show that this is 
not the case by considering the source inversion of a box 
source (region of uniform intensity) and two point sources 
as our next example. 
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Figure 6. Overly-regularised source inversion of Gaussian sources using curvature regularisation with A = 2000. Top row: the overly- 
regularised source shows smearing of the original two Gaussians (left-hand panel), the 1-cr error of the source intensity (middle panel), and 
the signal-to-noise ratio (right-hand panel). Bottom row: simulated data (left-hand panel), reconstructed data using the reconstructed 
source in the top left-hand panel and the f mapping matrix (middle panel), and the image residual showing arc features due to the 
overly-regularised inverted source (right-hand panel). 



3.3 Demonstration 2: box and point sources 

3.3.1 Simulated data 

To generate the simulated data of the box and point sources, 
we keep the following things the same as those in the exam- 
ple of two Gaussian sources: number of source pixels, source 
pixel size, number of image pixels, image pixel size, SIE po- 
tential model, and PSF model. The variance of the uniform 
uncorrelated Gaussian noise for the box and point sources is 
0.049, which leads to the same signal-to-noise ratio within 
the annular region as that in the two Gaussian sources. Fig. 7 
shows the box source and two point sources of unit intensi- 
ties with the caustic curves of the SIE in the left-hand panel, 
and the simulated image in the right-hand panel. 

We follow the same procedure as that in the previous 
example of two Gaussian sources to obtain the most likely 
inverted source, the most probable inverted source of a given 
form of regularisation, and the optimal form of regularisa- 
tion. Furthermore, we plot the results in the same format as 
that in the example of two Gaussian sources in Section 3.2. 



3.3.2 Most likely inverted source, most probable inverted 
source, and optimal form of regularisation 

Figs. 8 shows the most likely inverted source in the top row 
and the corresponding image residual in the bottom row. 



Similar to Fig. 2, the most likely inverted source in the top 
left-hand panel of Fig. 8 has poorly constrained pixels out- 
side the caustic curves due to lower image multiplicities. 
The residual image in the bottom right-hand panel of Fig. 8 
shows slight over-fitting to the noise inside the annulus. 

For regularised inversions, we solve equation (20) for 
the optimal regularisation constant for each of the three 
forms of regularisation. We list the optimal regularisation 
constants, A, and the associated log evidence evaluated at A 
in Table 2. Fig. 9 shows the most probable inverted source 
using the optimal regularisation constant in Table 2 for each 
of the three forms of regularisation. By visual inspection, 
the inverted source intensities (left-hand panels) with gra- 
dient regularisation matches the original source brightness 
distribution (Fig. 7) the best since curvature regularisation 
overly-smears the sharp edges and zeroth-order regularisa- 
tion leads to higher background noise. This is supported 
quantitatively by the values of the evidence in Table 2 with 
the highest value for gradient regularisation (which is ~ e 37 
more probable than curvature regularisation and ~ e 222 
more probable than zeroth-order regularisation). Again, this 
example illustrates that the signal-to-noise ratio does not de- 
termine the optimal regularisation - the right-hand panels of 
Fig. 9 show that curvature regularisation leads to the high- 
est signal-to-noise ratio, but the Bayesian analysis objec- 
tively ranks gradient over curvature! Finally, Fig. 10 shows 
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Table 2. The optimal regularisation constant for each of the three 
forms of regularisation for the inversion of box and point sources. 
The log evidence evaluated at the optimal regularisation constant 
is also listed. 



Regularisation 


zcroth-order 


gradient 


curvature 


A 


19.8 


21.0 


17.1 


logP(rf|A,f,g) 


6298 


6520 


6483 



the reconstructed image (middle panel) and the image resid- 
ual (right-hand panel) using the gradient regularisation. The 
corresponding plots for the zeroth-order and curvature reg- 
ularisations are similar and hence are not shown. 

3.4 Discussion 

3. 4-1 Preferred form of regularisation 

The two examples of source inversion considered in Sections 
3.2 and 3.3 show that the form of regularisation that is op- 
timally selected in the Bayesian approach depends on the 
nature of the source. Generally, with the three forms of reg- 
ularisation considered, curvature regularisation is preferred 
for smooth sources and gradient (or even zeroth-order) is 
preferred for sources with sharp intensity variations. In the 
two examples of source inversion, we found that at least 
one of the three considered forms of regularisation (which 
is not always the curvature form) allowed us to reconstruct 
successfully the original source in the inversion. Therefore, 
we did not need to consider other forms of regularisation. 
Nonetheless, this does not preclude other forms of regular- 
isation to be used. Even with additional types of regulari- 
sation, Bayesian analysis can always be used to choose the 
optimal one from the selection of forms of regularisation. 

3.4-2 Optimal number of source pixels 

So far, we have not discussed the size and the region of 
the source pixels to use. In both demonstration examples in 
Sections 3.2 and 3.3, we used source pixels that were half 
the size of the image pixels. In reality, one has to find the 
source region and the size of source pixels to use. 

The selection of the source pixel size for a given source 
region can be accomplished using Bayesian analysis in the 
model comparison step of Section 2.2.2 (the size of the source 
pixels is part of f since different source pixels sizes result in 
different matrices f). We find that source pixels sizes that 
are too large do not have enough degrees of freedom to fit to 
the data. On the other hand, source pixels that are too small 
will result in some source pixels being excluded in the f ma- 
trix (using the f construction method in Treu & Koopmans 
(2004)), which leads to a failure in the most likely source in- 
version since some pixels will be unconstrained. Therefore, 
for fixed pixel sizes over a source region (which our codes 
assume), the minimum source pixel size will be set by the 
minimum magnification over the source region. To improve 
the resolution in areas where there is more information, one 
would need to use adaptive grids. Dye & Warren (2005) have 
used adaptive grids in their source inversion routine, and we 
are also in the process of developing a code with adaptive 
gridding that will appear in a future paper. Our methods 



differ from that of Dye & Warren (2005) in that we follow a 
Bayesian approach and can thus quantitatively compare the 
forms of regularisation and the structure of source pixella- 
tion. 

At this stage, we cannot compare different source re- 
gions since the annular region on the image plane that maps 
to the source plane changes when the source region is al- 
tered. Recall that we only use the data within the annulus 
for source inversion. If the annular region changes, the data 
for inversion also changes. For model comparison between 
different data sets, we would need to know the normalisa- 
tion in equation (22), which we do not. Therefore, the best 
we can do in terms of source region selection is to pick a 
region that is large enough to enclose the entire luminous 
source, but small enough to not have the corresponding an- 
nular region exceeding the image region where we have data. 
Once the source region is selected, we can apply Bayesian 
analysis to determine the optimal source pixel size (subject 
to the minimum limit discussed above) and the optimal form 
of regularisation given the data. 



4 CONCLUSIONS AND FURTHER WORK 

We introduced and applied Bayesian analysis to the prob- 
lem of regularised source inversion in strong gravitational 
lensing. In the first level of Bayesian inference, we obtained 
the most probable inverted source of a given lens potential 
and PSF model f , a given form of regularisation g and an 
associated regularisation constant A; in the second level of 
inference, we used the evidence P(d|A,f, g) to obtain the 
optimal A and rank the different forms of regularisation, as- 
suming flat priors in A and g. 

We considered three different types of regularisation 
(zeroth-order, gradient and curvature) for source inversions. 
Of these three, the preferred form of regularisation depended 
on the intrinsic shape of the source intensity distribution: in 
general, the smoother the source, the higher the derivatives 
of the source intensity in the preferred form of regularisa- 
tion. In the demonstrated examples of first two Gaussian 
sources, and then a box with point sources, we optimised the 
evidence P(d\X, f , g) and numerically solved for the regular- 
isation constant for each of the three forms of regularisation. 
By comparing the evidence of each regularisation evaluated 
at the optimal A, we found that the curvature regularisation 
was preferred with the highest value of evidence for the two 
Gaussian sources, and gradient regularisation was preferred 
for the box with point sources. 

The study of the three forms of regularisation demon- 
strated the Bayesian technique used to compare different 
regularisation schemes objectively. The method is general, 
and the evidence can be used to rank other forms of reg- 
ularisation, including non-quadratic forms (e.g. maximum 
entropy methods) that lead to non-linear inversions (e.g. 
Wellington et al. 1996; Wayth et al. 2005; Brewer & Lewis 
2006). We restricted ourselves to linear inversion problems 
with quadratic forms of regularising function for computa- 
tional efficiency. 

In the demonstration of the Bayesian technique for reg- 
ularised source inversion, we assumed Gaussian noise, which 
may not be applicable to real data. In particular, Poisson 
noise may be more appropriate for real data, but the use 
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Original Source Simulated Data 




Figure 7. Left-hand panel: The simulated box and point sources with intensities of 1.0, shown with the astroid caustic curve of the 
SIE potential. Right-hand panel: The simulated image of the box and point sources (after convolution with Gaussian PSF and addition 
of noise as described in the text). The solid line is the critical curve of the SIE potential and the dotted lines mark the annular region 
where the source grid maps using the f mapping matrix. 



of Poisson noise distributions would lead to non-linear in- 
versions that we tried to avoid for computational efficiency. 
Nonetheless, the Bayesian method of using the evidence to 
rank the different models (including noise models) is still 
valid, irrespective of the linearity in the inversions. 

We could also use Bayesian analysis to determine the 
optimal size of source pixels for the reconstruction. The 
caveat is to ensure that the annular region on the image 
plane where the source plane maps is unchanged for differ- 
ent pixel sizes. Currently the smallest pixel size is limited 
by the region of low magnifications on the source plane. In 
order to use smaller pixels in regions of high magnifications, 
adaptive source gridding is needed. This has been studied 
by Dye & Warren (2005), and we are currently upgrading 
our codes to include this. 

The Bayesian approach can also be applied to poten- 
tial reconstruction on a pixellated potential grid. Bland- 
ford, Surpi & Kundic (2001) proposed a method to per- 
turbatively and iteratively correct the lens potential from 
a starting model by solving a first order partial differen- 
tial equation. This method has been studied by Koopmans 
(2005) and Suyu & Blandford (2006). The perturbation dif- 
ferential equation can be written in terms of matrices for 
a pixellated source brightness distribution and a pixellated 
potential, and the potential correction of each iteration can 
be obtained via a linear matrix inversion. This pixellated po- 
tential reconstruction is very similar to the source inversion 
problem and we are currently studying it in the Bayesian 
framework. 

The Bayesian analysis introduced in this paper is gen- 
eral and was so naturally applicable to both the source and 
potential reconstructions in strong gravitational lensing that 
we feel the Bayesian approach could be useful in other prob- 
lems involving model comparison. 
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Figure 8. Unregularised source inversion of box and point sources. Top left-hand panel: the most likely reconstructed source intensity 
distribution. The intensities outside the caustic curve of the potential model are not well-reconstructed due to fewer constraints (lower 
image multiplicities) outside the caustic curve. Top middle panel: the l-cr error of the inverted source intensity. The error is smaller inside 
the caustics due additional multiple image constraints. Top right-hand panel: the signal-to-noise ratio of the inverted source intensity. 
Bottom left-hand panel: the simulated data. Bottom middle panel: the reconstructed image using the most likely reconstructed source 
(top left-hand panel) and the f matrix from the potential and PSF models. Reconstructed data is confined to an annular region that 
maps on to the source plane. Bottom right-hand panel: the residual image obtained by subtracting the bottom middle panel from the 
bottom left-hand panel. The interior of the annular region is less noisy than the exterior, indicating that the reconstructed image is 
fitting to the noise in the simulated data. 
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APPENDIX A: FORMS OF REGULARISATION 

We consider the three most common quadratic functional 
forms of the regularisation found in the local literature: 
"zeroth-order," "gradient," and "curvature" (Press et al. 
1992, §18.4 and §18.5). For clarity reasons, we use explicit 
index and summation notation instead of vector and ma- 



trix notation for the expression of the regularising function 
E s (s). 

Zeroth-Order regularisation is the simplest case. The 
functional form is 

N s 

E s ( S ) = ±J^sl (Al) 

i=l 

and its Hessian is the identity operator C = I. This form of 
regularisation tries to minimize the intensity at every source 
pixel as a way to smooth the source intensity distribution. It 
introduces no correlation between the reconstruction pixel 
values. 

To discuss gradient and curvature forms of regularisa- 
tion, we label the pixels by their x and y locations (i.e., have 
two labels (ii, 12) for each pixel location instead of only one 
label (i) as in Section 3.1) since the mathematical struc- 
ture and nomenclature of the two forms of regularisation 
are clearer with the two-dimensional labelling. Let s ilii2 be 
the source intensity at pixel (11,12), where ii and i-i range 
from i\ = 1, . . . , iVi s and 12 = 1, ... , Ar 2s . The total number 
of source pixels is thus N s = 7Vi s 7V2 S . It is not difficult to 
translate the labelling of pixels on a rectangular grid from 
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Figure 9. The regularised source inversions of box and point sources with zeroth-order, gradient and curvature regularisations. Top row, 
from left to right: most probable inverted source, the 1-a error, and the signal-to-noise ratio with zeroth-order regularisation. Middle row, 
from left to right: same as top row but with gradient regularisation. Bottom row, from left to right: same as top row but with curvature 
regularisation. The panels in each column arc plotted on the same scales for comparison among the different forms of regularisation. 



two dimensions to one dimension for Bayesian analysis. For 
example, one way is to let i = ii + (12 — l)iV2 S . 
A form of gradient regularisation is 



JVi s -l N 2b 



1 v ^ ^ 

E s(s) = j E ^2 ^ Sil ' i2 ~ Si i + 1 '^Y 

il = l 12 — 1 
N le N 2s -1 



, N ls JV 2a 

1 \ 2 1 \ 2 



(A2) 



n = 1 ^2—1 



The first two terms are proportional to the gradient values of 
the pixels, so this form of regularisation tries to minimize the 
difference in the intensity between adjacent pixels. The last 
two terms can be viewed as gradient terms if we assume that 
the source intensities outside the grid are zeros. Although 
the non-singularity of the Hessian of _Es is not required for 
equation (13) since equation (A2) is of the form Es(s) = 
is T Cs, these last two terms ensure that the Hessian of Es 



16 S. H. Suyu et al. 



Simulated Data Reconstructed Data Residual 




fi, / arcsec 0, / arcsec 8^ / arcsec 

Figure 10. The image residual for gradient regularised source inversion with box and point sources. From left to right: simulated data, 
reconstructed data using the corresponding most probable inverted source in Fig. 9, and the residual equalling the difference between 
simulated and reconstructed data. The reconstructed data is restricted to the annulus marked by dotted lines that is mapped from the 
finite source grid using f . The noise in the residual image is more uniform compared to that of the unregularised inversion in Fig. 8. 



is non-singular and lead to s rog = 0. The non-singularity of 
the Hessian of Es (i.e., detC / 0) is crucial to the model 
comparison process described in Section 2.2.2 that requires 
the evaluation of the log evidence in equation (19). 
A form of curvature regularisation is 

JV ls -2 AT 2s 

E s(s) = - ^ ^ [si 1|i2 - 2a il+ i, i2 + s il+ 2,i 2 } 2 

il = l 12—1 
N le N 2b -2 

+ 2 51 X/ ' Sil ' i2 ~~ 2s n> i 2 + i + Si 1 , l2+ 2} 2 

+ 2 5Z I^l'^s- 1 ~ S H,iV 2a ] 2 
»1=1 

1 N2 " 

+ 2 51 l 5 ^^" 1 '^ ~ s N la ,i2] 2 
12=1 

1^2 1 N " 

+ 2 51 s ii. N 2s + 2 51 s Ni B ,i2- ( A3 ) 

i 2 = l 



il=l 



The first two terms measure the second derivatives (curva- 
ture) in the x and y directions of the pixels. The remaining 
terms are added to enforce our a priori preference towards 
a blank image with non-singular Hessian (important for the 
model ranking) that gives s Icg = 0. In essence, the majority 
of the source pixels have curvature regularisation, but two 
sides of the bordering pixels that do not have neighbouring 
pixels for the construction of curvature terms have gradient 
and zeroth-order terms instead. 

It is not difficult to verify that all three forms of regu- 
larisation have s rog = in the expansion in equation (12). 
Therefore, equation (13) for the most probable solution is 
applicable, as asserted in Section 3.1. 

None of the three forms of regularisation impose the 
source intensity to be positive. In fact, equations (Al) to 
(A3) suggest that the source intensities are equally likely to 
be positive or negative based on only the prior. 

In principle, one can continue the process and construct 
regularisations of higher derivatives. Regularisations with 
higher derivatives usually imply smoother source reconstruc- 



tions, as the correlations introduced by the gradient opera- 
tor extend over larger distances. Depending on the nature 
of the source, regularisations of higher derivatives may not 
necessarily be preferred over those of lower derivatives: as- 
tronomical sources tend to be fairly compact. Therefore, we 
restrict ourselves to the three lowest derivative forms of the 
regularisation for the source inversion problem. 



APPENDIX B: EXPLANATION OF THE 
SOURCE COVARIANCE MATRIX IN 
BAYESIAN ANALYSIS 

Notation 

Expressed in terms of matrix and vector multiplications, re- 
call equation (1) for the image intensity vector is 



d — fs + n, 



(Bl) 



where f is the lensing (response) matrix, s is the source 
intensity vector and n is the noise vector. Recall equation 
(3) is 

J B D (s) = i(f S -d) T C D 1 (f S -d), (B2) 

where Cd = (nn T ) is the image noise covariance matrix. 
We write the prior exponent as 



\Es(s) = ^s T S- 1 s, 



(B3) 



where, for simplicity, we have set s rcg = and Es(0) = 
(valid for the regularisation schemes considered in Appendix 
A), and S = (ss T ) is the a priori source covariance matrix. 
Comparing to equation (12), S = (AC) -1 . Combining equa- 
tions (B2) and (B3), the exponent of the posterior is 

M(s) = E D (s)+XEs{s) 

= i(f s -d) T C D 1 (fs-d) + i S T S- 1 s. (B4) 



Most likely estimate 

The most likely estimate is given by V£d(«ml) = 0, which 
gives 
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f T CD 1 (fs M L-d) = o. 



(B5) 



Rearranging the previous equation, we obtain 

SMi J = (f T Cv 1 f)- 1 f T Cv 1 d. (B6) 

Differentiating E D (s) again gives the Hessian 

B = VV£ D (s) =f T C D 1 f. (B7) 

This in turn allows us to write 

sml = B^Cf^d, (B8) 

which is equation (8). 

By construction, Cd, S, and B are symmetric matrices. 

Error on most likely estimate 

Let us assume that the true source intensity is s* (i.e. the ac- 
tual true source intensity for the particular image we are con- 
sidering). Now consider the expectation value of sml over 
realisations of the noise n: 

(sml) = B- 1 f T Cn 1 (fs, + n) = B^Cf^s, = s„, (B9) 

where we have used (n) = and angle brackets denote 
averages over noise realisations. Thus, we see that sml is an 
unbiassed estimator of s». 

Now consider the covariance of sml- Since (sml) = s», 
the covariance is given by 

((■sml — «.)(«ml — S *) T ) = (*mlSml) + s*sj 

-««(«ml) - (sml)sJ 
= (smls M l)-S». (BIO) 

where S* = s*s^ is the covariance matrix of the true signal 
and, once again, angle brackets denote averages over noise 
realisations. The term (sml«ml) above is given by 

(smlSml) = B _1 f T C I5 1 (rfd T )C E) 1 fB _i 

= B" 1 f T C D 1 ((fs» + n)(fs» + n) T >C D 1 fB" 1 

= B- 1 f T C D 1 (fs»sJf T + C D )C D 1 fB- 1 

= B _1 BS»BB 1 + B _1 BB _1 

= S. + B" 1 . (Bll) 

Inserting equation (Bll) in (BIO), the covariance of sml is 
given simply by 

T > = B"\ 



S.)( S ML — S, 



(B12) 



which agrees with equation (24) since A = B for the most 
likely solution (with A = 0). 

Most probable estimate 

The most probable estimate is given by VM(smp) = 0, 
which gives 

f^Cf^fsMP - d) + S _1 « M p = 0. (B13) 
Rearranging, we get 

smp = (S- 1 + f T C5 1 f)- 1 f T C5 1 d. (B14) 



Differentiating M(s) again gives the Hessian 
A ee VVM(s) = S 1 + f^C^f = S 1 + B, 
which, in turn, allows us to write 



(B15) 



smp = A-^C^d = A _1 BB _1 f T Cp 1 d = A _1 Bs M l,(B16) 

which agrees with equation (13). 

The Hessian A is symmetric by construction. 



Error on MP estimate 

Let us again assume that the true source intensity is s*. 
Using equations (B16) and (B9), the expectation value of 
smp over realisations of the noise n is 



(smp) = A 'B(sml) = A 1 Bs», 



(B17) 



where angle brackets denote averages over noise realisations. 
Thus, we see that smp is a biassed estimator (in general) of 
s*. We must therefore be careful when considering errors. 
First consider the covariance of smp, which is given by 



((smp - (smp))(smp — (smp)) ) 



(B18) 



where we have used equations (B16), (B17) and (Bll). Re- 
membering that A = S _1 + B, we have B = A S _1 , so the 
final result is 

((smp - (smp))(smp - (smp)) T ) = A -1 — A _1 S _1 A _1 ,(B19) 

which is equivalent to the equation (17) in Warren & Dye 
(2003). 

We verified equation (B19) by a Monte Carlo simulation 
of 1000 noise realisations of the source brightness distribu- 
tion described in Section 3.2.1. The noise realisations differ 
only in the values of the random seed used to generate ran- 
dom noise in the simulated data. We used curvature regular- 
isation (see Appendix A) with a fixed (and nearly optimal) 
value of the regularisation constant A for each of the 1000 
source inversions. The standard deviation of smp calculated 
from the 1000 inverted source distributions agrees with the 
1-cr error from equation (B19). 

Equation (B19) gives the error from the reconstructed 
source smp- Since smp is a biassed estimator of s*, what 
we really want to know is not the covariance above, but 
the quantity ((smp — s*)(smp — s„) T ), which gives us the 
distribution of errors from the true source. This is given by 

((smp - s*)(s M p - s*) T ) = A ^BS.BA 1 + A : BA 1 

+S» S»BA 1 
-A _1 BS,, (B20) 

where we have again used equations (B16), (B17) and (Bll). 
Substituting B = A S 1 gives, after simplifying, 



((smp — s*)(smp — s,) 1 



= A : +A _1 S 1 
(S.S" 1 - l)A _1 



(B21) 



In reality, we do not know S* (as this would require knowing 
the true source intensity s„). However, by averaging over 
source brightness distributions (denoted by a bar), we have 
S» = S. This is the manifestation of our explicit assumption 
that all source intensity distributions are drawn from the 
prior probability density defined by equation (4). Thus, 



((smp - s«)(smp - s„) T ) = A 1 , 



(B22) 



which is the inverse of VVM(s). In words, the covariance 
matrix describing the uncertainties in the inverted source 
intensity is given by the width of the approximated Gaus- 
sian posterior in equation (7), which is A -1 . The covariance 
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matrix of smp in equation (B19) in general under-estimates 
the error relative to the true source image because it does 
not incorporate the bias in the reconstructed source. 

This paper has been typeset from a TffjX/ WFftK file prepared 
by the author. 



